19 research outputs found

    Querying heterogeneous data in an in-situ unified agile system

    Data integration provides a unified view of data by combining different data sources. In today’s multi-disciplinary and collaborative research environments, data is often produced and consumed by various means, and multiple researchers in different divisions operate on the data to satisfy various research requirements, using different query processors and analysis tools. This makes data integration a crucial component of any successful data-intensive research activity. The fundamental difficulty is that data is heterogeneous not only in syntax, structure, and semantics, but also in the way it is accessed and queried. We introduce QUIS (QUery In-Situ), an agile query system equipped with a unified query language and a federated execution engine. It is capable of running queries on heterogeneous data sources in an in-situ manner. Its language provides advanced features such as virtual schemas, heterogeneous joins, and polymorphic result set representation. QUIS utilizes a federation of agents to transform a given input query written in its language into a (set of) computation models that are executable on the designated data sources. Federative query virtualization has the disadvantage that some aspects of a query may not be supported by the designated data sources. QUIS ensures that input queries are always fully satisfied: if the target data sources do not fulfill all of the query requirements, QUIS detects the features that are lacking and complements them in a transparent manner. QUIS provides union and join capabilities over an unbounded list of heterogeneous data sources; in addition, it offers solutions for heterogeneous query planning and optimization. In brief, QUIS is intended to mitigate data access heterogeneity through query virtualization, on-the-fly transformation, and federated execution. It offers in-situ querying, agile querying, querying of heterogeneous data sources, unified execution, late-bound virtual schemas, and remote execution.
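
    The transparent complementing described above can be pictured with a small sketch. The following is a hypothetical illustration, not QUIS code: the Source class, the capability names, and the plan() function are invented here to show how a federated engine might split a query between native execution at a source and on-the-fly compensation in the engine.

    # Hypothetical sketch of QUIS-style feature complementing; the names are
    # illustrative and do not reflect the actual QUIS implementation.
    from dataclasses import dataclass, field

    @dataclass
    class Source:
        """A registered data source and the query features it supports natively."""
        name: str
        capabilities: set = field(default_factory=set)  # e.g. {"filter", "join"}

    def plan(query_features, source):
        """Split a query's features into those the source runs natively and
        those the engine must complement transparently."""
        pushed_down = query_features & source.capabilities
        complemented = query_features - source.capabilities
        return pushed_down, complemented

    csv_source = Source("plots.csv", capabilities={"filter", "project"})
    native, compensated = plan({"filter", "project", "join"}, csv_source)
    print("push down to source:", native)        # filter, project
    print("complement in engine:", compensated)  # join

    In this sketch, a join requested over a plain CSV source is detected as unsupported and would be executed by the engine itself, matching the claim that input queries are always fully satisfied.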

    SENS: Semantic Synthetic Benchmarking Model for Integrated Supply Chain Simulation and Analysis

    Supply Chain (SC) modeling is essential to understand and influence SC behavior, especially for increasingly globalized and complex SCs. Existing models address various SC notions, e.g., processes, tiers, and production, in an isolated manner, limiting the enriched analysis granted by integrated information systems. Moreover, the scarcity of real-world data prevents benchmarking of overall SC performance under different circumstances, especially with respect to resilience during disruption. We present SENS, an ontology-based Knowledge Graph (KG) equipped with SPARQL implementations of KPIs to incorporate an end-to-end perspective of the SC, including standardized SCOR processes and metrics. Further, we propose SENS-GEN, a highly configurable data generator that leverages SENS to create synthetic semantic SC data under multiple scenario configurations for comprehensive analysis and benchmarking applications. The evaluation shows that the significantly improved simulation and analysis capabilities enabled by SENS facilitate grasping, controlling, and ultimately enhancing SC behavior and increasing resilience in disruptive scenarios.
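
    To make the idea of SPARQL-implemented KPIs concrete, here is a minimal sketch using rdflib. The sens: namespace and the Order/deliveredOnTime terms are invented for this example; the actual SENS ontology and its SCOR-aligned KPI queries may differ.

    # Illustrative only: the sens: vocabulary below is made up for the example.
    from rdflib import Graph, Literal, Namespace, RDF

    SENS = Namespace("http://example.org/sens#")  # hypothetical namespace

    g = Graph()
    for oid, on_time in [("o1", True), ("o2", False), ("o3", True)]:
        order = SENS[oid]
        g.add((order, RDF.type, SENS.Order))
        g.add((order, SENS.deliveredOnTime, Literal(on_time)))

    # A KPI in the spirit of SCOR delivery-performance metrics:
    # the share of orders delivered on time.
    kpi = """
    PREFIX sens: <http://example.org/sens#>
    SELECT (AVG(IF(?onTime, 1.0, 0.0)) AS ?onTimeRate)
    WHERE { ?order a sens:Order ; sens:deliveredOnTime ?onTime . }
    """
    for row in g.query(kpi):
        print("on-time delivery rate:", row.onTimeRate)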

    Towards Semantic Integration of Federated Research Data

    Digitization of the research (data) lifecycle has created a galaxy of data nodes that are often characterized by sparse interoperability. With the start of the European Open Science Cloud in November 2018, and facing the upcoming call for the creation of the National Research Data Infrastructure (NFDI), researchers and infrastructure providers will need to harmonize their data efforts. In this article, we present a recently initiated proof of concept for a network of semantically harmonized Research Data Management (RDM) systems. This comprises a network of research data management and publication systems with semantic integration at three levels: data, metadata, and schema. As such, an ecosystem for agile, evolutionary ontology development and the community-driven definition of quality criteria and classification schemes for scientific domains will be created. In contrast to the classical data repository approach, this process will allow for cross-repository as well as cross-domain data discovery, integration, and collaboration, and will lead to open and interoperable data portals throughout the scientific domains. At the joint lab of the L3S research center and the TIB Leibniz Information Center for Science and Technology in Hanover, we are developing a solution based on a customized distribution of CKAN called the Leibniz Data Manager (LDM). LDM utilizes CKAN’s harvesting functionality to exchange metadata using the DCAT vocabulary. By adding the concept of a semantic schema to LDM, it will contribute to realizing the FAIR paradigm. A dataset’s variables, together with their attributes and relationships, will improve findability and accessibility and can be processed by humans or machines across scientific domains. We argue that it is crucial for RDM development in Germany that domain-specific data silos remain the exception, and that a semantically linked network of generic and domain-specific research data systems and services at the national, regional, and organizational levels be promoted within the NFDI initiative.
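
    The DCAT-based metadata exchange can be sketched in a few lines. This is an illustrative sketch, not LDM code: the endpoint URL and dataset id are placeholders, and a production setup would rely on CKAN's harvester together with an extension such as ckanext-dcat rather than hand-built triples.

    # Placeholder endpoint and dataset id; shown only to illustrate mapping
    # CKAN package metadata onto the DCAT vocabulary.
    from ckanapi import RemoteCKAN
    from rdflib import Graph, Literal, URIRef
    from rdflib.namespace import DCAT, DCTERMS, RDF

    ckan = RemoteCKAN("https://demo.ckan.org")           # placeholder endpoint
    pkg = ckan.action.package_show(id="sample-dataset")  # placeholder id

    g = Graph()
    ds = URIRef(f"https://demo.ckan.org/dataset/{pkg['name']}")
    g.add((ds, RDF.type, DCAT.Dataset))
    g.add((ds, DCTERMS.title, Literal(pkg["title"])))
    if pkg.get("notes"):
        g.add((ds, DCTERMS.description, Literal(pkg["notes"])))

    print(g.serialize(format="turtle"))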

    Data lifecycle is not a cycle, but a plane!

    Most of the data-intensive scientific domains, e.g., the life, natural, and geo-sciences, have come up with data lifecycles. These cycles feature, in various ways, a set of core data-centric steps, e.g., planning, collecting, describing, integrating, analyzing, and publishing. Although they differ in the steps they identify and the execution order, they collectively suffer from a set of shortcomings. They mainly promote a waterfall-like model of sequentially executing the lifecycle’s steps. For example, the lifecycle used by DataONE suggests that "analyze" happens after "integrate". In practice, however, a scientist may need to analyze data without performing the integration. In general, scientists may not need to accomplish all the steps. Also, in many cases, they simply jump from, e.g., "collect" to "analyze" in order to evaluate the feasibility and fitness of the data, and then return to the "describe" and "preserve" steps. This causes the cycle to gradually turn into a mesh. Indeed, this problem has been recognized and dealt with by the GFBio and USGS data lifecycles. The former has added a set of direct links between non-neighboring steps to allow shortcuts, while the latter has factored out cross-cutting steps, e.g., "describe" and "manage quality", arguing that these tasks must be performed continually across all stages of the lifecycle. Although the aforementioned lifecycles have recognized these issues, they do not offer customization guidelines based on, e.g., project requirements, resource availability, priority, or effort estimation. In this work, we propose a two-dimensional Cartesian-like plane, in which the x- and y-axes represent phases and disciplines, respectively. A phase is a stage of the project with a predefined focus that leads the work towards achieving a set of targeted objectives in a specific timespan. We identify four phases: conception, implementation, publishing, and preservation. Phases can be repeated in a run and do not need to have equal timespans; however, each phase must satisfy its exit criteria before the project can proceed to the next phase. A discipline, on the vertical axis, is a set of correlated activities that, when performed, make measurable progress in the data-centric project. We have incorporated these disciplines: plan, acquire, assure, describe, preserve, discover, integrate, analyze, maintain, and execute. An execution plan is developed by placing the required activities in their respective disciplines’ lanes on the plane. Each task (activity instance) is visualized as a rectangle whose width and height indicate, respectively, the duration and the effort estimated to complete it. The phases, as well as the characteristics of the project (requirements, size, team, time, and budget), may influence these dimensions. A discipline or an activity may be utilized several times in different phases. For example, a planning activity gains more weight in conception and fades out over the course of the project, while analysis activities start in mid-conception, get full focus during implementation, and may still need some attention during the publishing phase. Also, multiple activities of different disciplines can run in parallel; however, each task’s objective should remain aligned with the phase’s focus and exit criteria. For instance, an analysis task in the conception phase may utilize multiple methodologies to perform experimentation on a small sample of a designated dataset, while the same task in the implementation phase conducts a full-fledged analysis using the chosen methodology on the whole dataset.
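
    The plane lends itself to a simple data model, sketched below. The phase and discipline names are taken from the text; the Task fields and the example estimates are invented for illustration.

    # Hypothetical model of the proposed plane; the estimates are made up.
    from dataclasses import dataclass

    PHASES = ["conception", "implementation", "publishing", "preservation"]
    DISCIPLINES = ["plan", "acquire", "assure", "describe", "preserve",
                   "discover", "integrate", "analyze", "maintain", "execute"]

    @dataclass
    class Task:
        """A rectangle on the plane: placed in a (phase, discipline) cell,
        width = estimated duration, height = estimated effort."""
        name: str
        phase: str
        discipline: str
        duration_weeks: float  # rectangle width
        effort_pm: float       # rectangle height, in person-months

        def __post_init__(self):
            assert self.phase in PHASES and self.discipline in DISCIPLINES

    # The same discipline recurs across phases with different weight:
    plan = [
        Task("sample analysis", "conception", "analyze", 2, 0.5),
        Task("full analysis", "implementation", "analyze", 8, 3.0),
    ]
    for t in plan:
        print(f"{t.discipline} @ {t.phase}: {t.duration_weeks}w x {t.effort_pm}pm")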

    Research data management with BEXIS 2 – An overview and introduction to the special session

    In this talk, we will introduce the data management platform BEXIS 2 and provide an overview of its features and capabilities. We will demonstrate how BEXIS 2 can support researchers in managing their data throughout the different stages of the data lifecycle. Since BEXIS 2 has been designed for large collaborative projects with central data management, we will also address features relevant to decision makers and system administrators.

    BEXIS Tech Talk #4 (28.04.2016): The 3rd Party Libraries

    This talk is part of the BEXIS Tech Talk Series, recorded to share the knowledge needed for developing the BEXIS 2 data management platform; it is therefore geared towards software engineers and developers. BEXIS 2 is developed as part of the BExIS++ project funded by the DFG (http://bexis2.uni-jena.de).

    BEXIS Tech Talk #3 (22.03.2016): The System Architecture

    This talk is part of the BEXIS Tech Talk Series, recorded to share the knowledge needed for developing the BEXIS 2 data management platform; it is therefore geared towards software engineers and developers. BEXIS 2 is developed as part of the BExIS++ project funded by the DFG (http://bexis2.uni-jena.de).

    BEXIS Tech Talk #7 (28.10.2016): Configuration and Change Management

    This talk is part of the BEXIS Tech Talk Series, recorded to share the knowledge needed for developing the BEXIS 2 data management platform; it is therefore geared towards software engineers and developers. BEXIS 2 is developed as part of the BExIS++ project funded by the DFG (http://bexis2.uni-jena.de).

    BEXIS Tech Talk #5 (14.06.2016): The Extensibility of BEXIS 2

    This talk is part of the BEXIS Tech Talk Series, recorded to share the knowledge needed for developing the BEXIS 2 data management platform; it is therefore geared towards software engineers and developers. BEXIS 2 is developed as part of the BExIS++ project funded by the DFG (http://bexis2.uni-jena.de).